Think Before You Speak: Cultivating Communication Skills of Large Language Models via Inner Monologue
The emergence of large language models (LLMs) has further improved the
capabilities of open-domain dialogue systems, which can now generate fluent,
coherent, and diverse responses. However, LLMs still lack an important
ability: communication skills. This makes them more like information-seeking
tools than anthropomorphic chatbots. To make LLMs more anthropomorphic and
proactive during conversation, we add five communication skills to the
response generation process: topic transition, proactively asking questions,
concept guidance, empathy, and frequent summarising. Adding these
communication skills increases users' interest in the conversation and
encourages them to chat for longer. To enable LLMs to better understand and
use communication skills, we design and add an inner monologue to LLMs. The
complete process is achieved through prompt engineering and in-context
learning. To evaluate communication skills, we construct a benchmark named
Cskills, which covers the various communication skills and also enables a
more comprehensive evaluation of the model's dialogue generation ability.
Experimental results show that the proposed CSIM strategy improves the
backbone models and outperforms the baselines in both automatic and human
evaluations.
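The prompt-engineering approach the abstract describes can be sketched as follows. This is an illustrative assumption, not the paper's actual prompts: the skill list comes from the abstract, but the prompt wording, the `[Monologue]`/`[Response]` markers, and the `build_csim_prompt` helper are hypothetical.

```python
# Hypothetical sketch of an inner-monologue prompt for a chat LLM.
# The five skill names are taken from the abstract; everything else
# (markers, wording, function name) is an illustrative assumption.

SKILLS = [
    "topic transition",
    "proactively asking questions",
    "concept guidance",
    "empathy",
    "summarising often",
]

def build_csim_prompt(history, user_turn):
    """Compose a prompt asking the model to think privately (inner
    monologue) about which skill to apply before replying."""
    lines = ["You are a proactive, anthropomorphic chatbot."]
    lines.append(
        "Before answering, write a hidden [Monologue]: decide which of "
        "these skills to apply: " + ", ".join(SKILLS) + "."
    )
    lines.append("Then write the visible [Response] to the user.")
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"User: {user_turn}")
    return "\n".join(lines)

prompt = build_csim_prompt(
    [("User", "I've been stressed lately."),
     ("Bot", "I'm sorry to hear that.")],
    "Work has been overwhelming.",
)
print(prompt)
```

In-context learning would extend this by prepending a few worked monologue-plus-response examples to the same prompt.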
RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling
Retrieval-augmented language models show promise in addressing issues like
outdated information and hallucinations in language models (LMs). However,
current research faces two main problems: 1) determining what information to
retrieve, and 2) effectively combining retrieved information during generation.
We argue that valuable retrieved information should not only be related to the
current source text but should also reflect the future target text, given that
LMs model future tokens. Moreover, we propose that aggregation using
latent variables derived from a compact latent space is more efficient than
utilizing explicit raw text, which is limited by context length and susceptible
to noise. Therefore, we introduce RegaVAE, a retrieval-augmented language model
built upon the variational auto-encoder (VAE). It encodes the text corpus into
a latent space, capturing current and future information from both source and
target text. Additionally, we leverage the VAE to initialize the latent space
and adopt the probabilistic form of the retrieval generation paradigm by
expanding the Gaussian prior distribution into a Gaussian mixture distribution.
Theoretical analysis provides an optimizable upper bound for RegaVAE.
Experimental results on various datasets demonstrate significant improvements
in text generation quality and hallucination removal.
Comment: Accepted to the Findings of EMNLP 202
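The latent-space retrieval and Gaussian-mixture aggregation the abstract describes can be illustrated numerically. This is a minimal sketch under stated assumptions, not RegaVAE's implementation: the corpus of posterior means, the unit-variance Gaussian kernel, and the weighted-mean aggregation are all illustrative choices.

```python
import numpy as np

# Illustrative sketch (not the paper's code): retrieving neighbours in a
# VAE latent space and aggregating them as a retrieval-weighted Gaussian
# mixture instead of a single Gaussian prior. The stored posterior means
# and the unit-variance kernel are assumptions for this demo.

rng = np.random.default_rng(0)
corpus_mu = rng.normal(size=(100, 16))  # posterior means of 100 stored texts
query_mu = rng.normal(size=16)          # posterior mean of the source text

# Retrieval weights: softmax over negative squared latent distances,
# i.e. each stored text's Gaussian likelihood under a unit-variance kernel
# centred at the query (computed with a max-shift for numerical stability).
logits = -0.5 * ((corpus_mu - query_mu) ** 2).sum(axis=1)
w = np.exp(logits - logits.max())
w /= w.sum()

# Aggregation in latent space: the mixture's weighted mean replaces raw
# retrieved text, so context length and surface noise are not a bottleneck.
z_agg = w @ corpus_mu
print(z_agg.shape)  # (16,)
```

The decoder would then condition on `z_agg` (or sample from the mixture) rather than on concatenated retrieved passages.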